
Collaborating Authors

 transient iteration complexity



74e1ed8b55ea44fd7dbb685c412568a4-Supplemental.pdf

Neural Information Processing Systems

This bound is attained if n is an even number, in which case λ_{n/2} is the desired eigenvalue. Based on the numerical experiment, we observe that if n is an odd number, this bound cannot be attained. The ring topology is undirected and is illustrated in Figure 1(a). The star topology is undirected and is illustrated in Figure 1(b). Its weight matrix is generated according to the Metropolis rule and is therefore symmetric. The 2D-grid topology is undirected and is illustrated in Figure 1(c).
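The Metropolis rule mentioned above can be sketched as follows. This is a minimal sketch assuming the common Metropolis-Hastings form: w_ij = 1/(1 + max(d_i, d_j)) for neighbours i ≠ j, where d_i is the degree of node i, and w_ii = 1 - Σ_{j≠i} w_ij. The function names and edge-list helpers are hypothetical, and the 6-node size only mirrors the 6-node examples in the text. Exact fractions make the symmetry and unit row sums easy to verify.

```python
from fractions import Fraction


def metropolis_weights(n, edges):
    """Metropolis weights for an undirected graph on nodes 0..n-1 (edges without self-loops)."""
    neighbors = [set() for _ in range(n)]
    for i, j in edges:
        neighbors[i].add(j)
        neighbors[j].add(i)
    degree = [len(nb) for nb in neighbors]
    W = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        for j in neighbors[i]:
            # The off-diagonal weight depends only on the pair {i, j}, so W is symmetric.
            W[i][j] = Fraction(1, 1 + max(degree[i], degree[j]))
        # The self-weight absorbs the remainder so every row sums to exactly 1.
        W[i][i] = 1 - sum(W[i][j] for j in neighbors[i])
    return W


def star_edges(n):
    # Hypothetical helper: node 0 is the hub connected to every other node.
    return [(0, k) for k in range(1, n)]


def ring_edges(n):
    # Hypothetical helper: node k is linked to node k + 1 (mod n).
    return [(k, (k + 1) % n) for k in range(n)]


def grid_edges(rows, cols):
    # Hypothetical helper: node r * cols + c is linked to its right and lower neighbours.
    edges = []
    for r in range(rows):
        for c in range(cols):
            v = r * cols + c
            if c + 1 < cols:
                edges.append((v, v + 1))
            if r + 1 < rows:
                edges.append((v, v + cols))
    return edges


W_star = metropolis_weights(6, star_edges(6))
for row in W_star:
    print([str(w) for w in row])
```

For the star, the hub has degree n - 1, so every hub-leaf weight is 1/n and each leaf keeps 1 - 1/n on itself; the self-weight is always positive because Σ_j 1/(1 + max(d_i, d_j)) ≤ d_i/(1 + d_i) < 1.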




A Static Exponential Graph

Neural Information Processing Systems

Illustration of the 6-node static exponential graph and its associated weight matrix. Discrete Fourier Transform (DFT) and its connection to circulant matrices, which plays a critical role in the proof. Use the conjugate argument, then apply the same procedure as for i = 1. The shape of the 6-node topologies discussed in Sec. The ring topology is undirected and is illustrated in Figure 1(a).
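The link between the DFT and circulant matrices can be checked numerically. Every circulant matrix C with C[i][j] = c[(j - i) mod n] is diagonalized by the Fourier vectors v_k with entries ω^{jk}, where ω = e^{2πi/n}, and its eigenvalues are λ_k = Σ_m c_m ω^{mk}. The sketch below assumes the usual static exponential graph definition (node i links to nodes (i + 2^m) mod n for m = 0, 1, ..., ⌈log2 n⌉ - 1, with uniform weight 1/(⌈log2 n⌉ + 1) including the self-loop), which gives a circulant weight matrix. The function names are hypothetical.

```python
import cmath
import math


def exponential_graph_weights(n):
    """Static exponential graph weights under the assumed definition above."""
    tau = math.ceil(math.log2(n))  # number of out-neighbours per node
    w = 1.0 / (tau + 1)            # uniform weight, self-loop included
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        W[i][i] = w
        for m in range(tau):
            W[i][(i + 2 ** m) % n] = w  # hop distances 1, 2, 4, ...
    return W


def circulant_eigenvalues(first_row):
    """Eigenvalues of C[i][j] = c[(j - i) mod n] via the DFT of its first row."""
    n = len(first_row)
    return [
        sum(c * cmath.exp(2j * cmath.pi * m * k / n) for m, c in enumerate(first_row))
        for k in range(n)
    ]


def fourier_vector(n, k):
    # Eigenvector shared by every n x n circulant matrix: v_j = exp(2*pi*i*j*k/n).
    return [cmath.exp(2j * cmath.pi * j * k / n) for j in range(n)]


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


n = 6
W = exponential_graph_weights(n)
lams = circulant_eigenvalues(W[0])
for k, lam in enumerate(lams):
    v = fourier_vector(n, k)
    # Residual of W v_k = lambda_k v_k; should be at round-off level.
    residual = max(abs(a - lam * b) for a, b in zip(matvec(W, v), v))
    print(f"k={k}: |lambda|={abs(lam):.4f}, residual={residual:.1e}")
```

For n = 6 each node has hop distances 1, 2 and 4 with weight 1/4, and the largest non-trivial eigenvalue magnitude is 0.5, attained at k = n/2 = 3; the remaining non-trivial eigenvalues have magnitude 0.25.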



SPARKLE: A Unified Single-Loop Primal-Dual Framework for Decentralized Bilevel Optimization

Zhu, Shuchen, Kong, Boao, Lu, Songtao, Huang, Xinmeng, Yuan, Kun

arXiv.org Machine Learning

This paper studies decentralized bilevel optimization, in which multiple agents collaborate to solve problems involving nested optimization structures with neighborhood communications. Most existing literature primarily utilizes gradient tracking to mitigate the influence of data heterogeneity, without exploring other well-known heterogeneity-correction techniques such as EXTRA or Exact Diffusion. Additionally, these studies often employ identical decentralized strategies for both upper- and lower-level problems, neglecting to leverage distinct mechanisms across different levels. To address these limitations, this paper proposes SPARKLE, a unified Single-loop Primal-dual AlgoRithm frameworK for decentraLized bilEvel optimization. SPARKLE offers the flexibility to incorporate various heterogeneity-correction strategies into the algorithm. Moreover, SPARKLE allows for different strategies to solve upper- and lower-level problems. We present a unified convergence analysis for SPARKLE, applicable to all its variants, with state-of-the-art convergence rates compared to existing decentralized bilevel algorithms. Our results further reveal that EXTRA and Exact Diffusion are more suitable for decentralized bilevel optimization, and using mixed strategies in bilevel algorithms brings more benefits than relying solely on gradient tracking.
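The effect of heterogeneity correction can be seen on a single-level toy problem. This is not SPARKLE, which is bilevel and primal-dual; it is a minimal sketch, assuming a 5-node ring with uniform 1/3 weights and local quadratics f_i(x) = (x - b_i)²/2, that contrasts plain decentralized gradient descent (DGD) with a DIGing-style gradient-tracking update and the Exact Diffusion recursion. All function names and constants are hypothetical.

```python
def ring_weights(n):
    """Symmetric, doubly stochastic ring weights: 1/3 on self and both neighbours (n >= 3)."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i][j % n] = 1.0 / 3.0
    return W


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def local_grads(x, b):
    # Node i holds f_i(x) = 0.5 * (x - b_i)^2, so grad f_i(x_i) = x_i - b_i.
    return [xi - bi for xi, bi in zip(x, b)]


def dgd(W, b, alpha, iters):
    """Plain decentralized gradient descent: x <- W x - alpha * grad."""
    x = [0.0] * len(b)
    for _ in range(iters):
        g = local_grads(x, b)
        x = [wx - alpha * gi for wx, gi in zip(matvec(W, x), g)]
    return x


def gradient_tracking(W, b, alpha, iters):
    """DIGing-style gradient tracking: y_i tracks the network-average gradient."""
    x = [0.0] * len(b)
    y = local_grads(x, b)  # y_0 = local gradients at x_0
    for _ in range(iters):
        x_new = [wx - alpha * yi for wx, yi in zip(matvec(W, x), y)]
        g_old, g_new = local_grads(x, b), local_grads(x_new, b)
        y = [wy + gn - go for wy, gn, go in zip(matvec(W, y), g_new, g_old)]
        x = x_new
    return x


def exact_diffusion(W, b, alpha, iters):
    """Exact Diffusion: adapt, correct, then combine with W_bar = (I + W) / 2."""
    n = len(b)
    W_bar = [[(W[i][j] + (1.0 if i == j else 0.0)) / 2 for j in range(n)] for i in range(n)]
    x = [0.0] * n
    psi = x[:]  # psi_0 = x_0
    for _ in range(iters):
        psi_new = [xi - alpha * gi for xi, gi in zip(x, local_grads(x, b))]  # adapt
        phi = [pn + xi - po for pn, xi, po in zip(psi_new, x, psi)]          # correct
        x = matvec(W_bar, phi)                                               # combine
        psi = psi_new
    return x


b = [1.0, 2.0, 3.0, 4.0, 5.0]  # heterogeneous local data
W = ring_weights(len(b))
target = sum(b) / len(b)       # minimizer of the average objective
for name, method in [("DGD", dgd), ("gradient tracking", gradient_tracking),
                     ("Exact Diffusion", exact_diffusion)]:
    x = method(W, b, alpha=0.1, iters=2000)
    print(f"{name}: max |x_i - mean(b)| = {max(abs(xi - target) for xi in x):.2e}")
```

With heterogeneous b_i and a fixed step size, DGD settles at a biased point: its network average still converges to mean(b), but the local iterates disagree. Both corrected methods drive every x_i to mean(b).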


Decentralized Bilevel Optimization over Graphs: Loopless Algorithmic Update and Transient Iteration Complexity

Kong, Boao, Zhu, Shuchen, Lu, Songtao, Huang, Xinmeng, Yuan, Kun

arXiv.org Artificial Intelligence

Stochastic bilevel optimization (SBO) is becoming increasingly essential in machine learning due to its versatility in handling nested structures. To address large-scale SBO, decentralized approaches have emerged as effective paradigms in which nodes communicate with immediate neighbors without a central server, thereby improving communication efficiency and enhancing algorithmic robustness. However, current decentralized SBO algorithms face challenges, including expensive inner-loop updates and unclear understanding of the influence of network topology, data heterogeneity, and the nested bilevel algorithmic structures. In this paper, we introduce a single-loop decentralized SBO (D-SOBA) algorithm and establish its transient iteration complexity, which, for the first time, clarifies the joint influence of network topology and data heterogeneity on decentralized bilevel algorithms. D-SOBA achieves the state-of-the-art asymptotic rate, asymptotic gradient/Hessian complexity, and transient iteration complexity under more relaxed assumptions compared to existing methods. Numerical experiments validate our theoretical findings.
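Transient iteration complexity, as used in this line of work, counts the iterations a decentralized method needs before the network-independent term of its convergence bound (typically of order σ/√(nK) for n nodes and K iterations) dominates, after which it matches centralized stochastic methods with linear speedup. The sketch below locates that crossing point for a hypothetical bound; the extra term C/((1 - ρ)K) and all constants are illustrative assumptions, not the bound proven for D-SOBA.

```python
import math


def transient_iterations(leading, higher_order, k_max=10**12):
    """Smallest K >= 1 with higher_order(K) <= leading(K).

    Assumes higher_order(K) / leading(K) is non-increasing in K, so the
    crossing can be bracketed by doubling and then refined by bisection.
    """
    hi = 1
    while higher_order(hi) > leading(hi):
        hi *= 2
        if hi > k_max:
            raise ValueError("no crossing found below k_max")
    lo = hi // 2  # last K known to violate the condition (0 when hi == 1)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if higher_order(mid) <= leading(mid):
            hi = mid
        else:
            lo = mid
    return hi


# Hypothetical constants: noise level sigma, number of nodes n, constant C.
SIGMA, N_NODES, C = 1.0, 16, 1.0


def make_bound_terms(gap):
    """Return (leading, higher_order) terms for spectral gap 1 - rho = gap."""

    def leading(k):
        # Network-independent linear-speedup term sigma / sqrt(n K).
        return SIGMA / math.sqrt(N_NODES * k)

    def higher_order(k):
        # Hypothetical network-dependent term C / ((1 - rho) K).
        return C / (gap * k)

    return leading, higher_order


for gap in (0.5, 0.1, 0.02):
    k_t = transient_iterations(*make_bound_terms(gap))
    print(f"spectral gap {gap}: transient iterations ~ {k_t}")
```

Under this hypothetical bound the crossing sits near C²n/((1 - ρ)²σ²), so the count grows like 1/(1 - ρ)²; this is the kind of topology dependence that a transient-iteration analysis makes explicit.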